Enhancing audio speech using visual speech features

Authors

  • Ibrahim Almajai
  • Ben P. Milner
Abstract

This work presents a novel approach to speech enhancement that exploits the bimodality of speech and the correlation between audio and visual speech features. For speech enhancement, a visually-derived Wiener filter is developed. The filter obtains clean speech statistics from visual features by modelling the joint density of audio and visual features and making a maximum a posteriori estimate of clean audio from the visual speech features. The noise statistics for the Wiener filter rely on an audio-visual voice activity detector that classifies input audio as speech or nonspeech, enabling a noise model to be updated. Analysis shows that the speech and noise statistics are estimated effectively, and human listening tests measure the effectiveness of the resulting Wiener filter.
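
The filtering step described in the abstract can be illustrated with a minimal sketch. It assumes the textbook per-frequency-bin Wiener gain, speech power divided by speech-plus-noise power, and a simple recursive noise update that runs only on frames flagged as nonspeech. The function names, the gain floor, and the smoothing factor `alpha` are hypothetical choices for illustration and are not taken from the paper. The visually-derived maximum a posteriori estimate of the clean speech statistics is not reproduced here; `speech_psd` simply stands in for its output.

```python
def wiener_gain(speech_psd, noise_psd, floor=1e-3):
    """Per-bin Wiener gain H = S / (S + N), clamped to a small floor.

    speech_psd: estimated clean-speech power per frequency bin
                (in the paper this estimate comes from visual features).
    noise_psd:  estimated noise power per frequency bin.
    floor:      assumed lower limit on the gain (illustrative value).
    """
    gains = []
    for s, n in zip(speech_psd, noise_psd):
        denom = s + n
        g = s / denom if denom > 0 else 0.0
        gains.append(max(g, floor))
    return gains


def update_noise_psd(noise_psd, frame_psd, is_speech, alpha=0.9):
    """Update the noise model only when the detector reports nonspeech.

    is_speech: decision from a voice activity detector for this frame.
    alpha:     assumed smoothing factor for the recursive average.
    """
    if is_speech:
        # Speech is present, so the frame is not a clean noise observation.
        return list(noise_psd)
    return [alpha * n + (1 - alpha) * p for n, p in zip(noise_psd, frame_psd)]
```

In use, each noisy frame's spectrum would be multiplied bin by bin by the gains from `wiener_gain`, with `update_noise_psd` called once per frame using the detector's speech/nonspeech decision.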


Similar resources

Audio and Visual Processing to Enhance User Interfaces

Summary: This report details the work carried out between the months of October 1996 and June 1997, and the results so far achieved. Numerous speech and person recognition experiments have been performed using both speech and visual lip features. The discriminatory properties of audio and visual features are examined, along with the performance of two classifiers, namely VQ and DTW. The effect of ...

Recognition of isolated words using Zernike and MFCC features for audio visual speech recognition

Automatic Speech Recognition (ASR) by machine is an attractive research topic in the signal processing domain and has attracted many researchers to contribute to this area. In recent years, there have been many advances in automatic speech reading systems with the inclusion of audio and visual speech features to recognize words under noisy conditions. The objective of audio-visual speech recognition ...

A system for audio-visual speech recognition

In this work, a system for audio-visual speech recognition is presented. A new hybrid visual feature combination suitable for audio-visual speech recognition was implemented. The features comprise both the shape and the appearance of the lips; dimensionality reduction is applied using the discrete cosine transform (DCT). A large visual speech database of the German language has been ass...

Multi-Modal Hybrid Deep Neural Network for Speech Enhancement

Deep Neural Networks (DNN) have been successful in enhancing noisy speech signals. Enhancement is achieved by learning a nonlinear mapping function from the features of the corrupted speech signal to that of the reference clean speech signal. The quality of predicted features can be improved by providing additional side channel information that is robust to noise, such as visual cues. In this p...

Audio-Visual Speech Recognition Using Lip Information Extracted from Side-Face Images

This paper proposes an audio-visual speech recognition method using lip information extracted from side-face images as an attempt to increase noise robustness in mobile environments. Our proposed method assumes that lip images can be captured using a small camera installed in a handset. Two different kinds of lip features, lip-contour geometric features and lip-motion velocity features, are use...


Publication date: 2009